Add `atomic_ref` support for 8- and 16-bit types #2255
…ms to be invalid though
🟩 CI finished in 4h 38m: Pass: 100%/417 | Total: 3d 08h | Avg: 11m 37s | Max: 1h 16m | Hits: 79%/34092

| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | pycuda |

Modifications in project or dependencies?

| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |

🏃 Runner counts (total jobs: 417)

| # | Runner |
|---|---|
| 305 | linux-amd64-cpu16 |
| 61 | linux-amd64-gpu-v100-latest-1 |
| 28 | linux-arm64-cpu16 |
| 23 | windows-amd64-cpu16 |

Review thread on `libcudacxx/include/cuda/std/__atomic/functions/cuda_ptx_derived.h` (outdated, resolved)
🟨 CI finished in 7h 41m: Pass: 98%/417 | Total: 2d 16h | Avg: 9m 13s | Max: 43m 42s | Hits: 97%/34092

| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | pycuda |

Modifications in project or dependencies?

| +/- | Project |
|---|---|
| | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |

🏃 Runner counts (total jobs: 417)

| # | Runner |
|---|---|
| 305 | linux-amd64-cpu16 |
| 61 | linux-amd64-gpu-v100-latest-1 |
| 28 | linux-arm64-cpu16 |
| 23 | windows-amd64-cpu16 |

Review thread on `libcudacxx/include/cuda/std/__atomic/functions/cuda_ptx_derived.h` (outdated, resolved)
🟨 CI finished in 6h 55m: Pass: 98%/421 | Total: 2d 07h | Avg: 7m 52s | Max: 56m 07s | Hits: 98%/34092

| +/- | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | pycuda |

Modifications in project or dependencies?

| +/- | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |

🏃 Runner counts (total jobs: 421)

| # | Runner |
|---|---|
| 305 | linux-amd64-cpu16 |
| 65 | linux-amd64-gpu-v100-latest-1 |
| 28 | linux-arm64-cpu16 |
| 23 | windows-amd64-cpu16 |

🟨 CI finished in 12h 23m: Pass: 95%/437 | Total: 2d 22h | Avg: 9m 40s | Max: 59m 39s | Hits: 97%/41584

| +/- | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | pycuda |
| | CUDA C Core Library |

Modifications in project or dependencies?

| +/- | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |
| +/- | CUDA C Core Library |

🏃 Runner counts (total jobs: 437)

| # | Runner |
|---|---|
| 320 | linux-amd64-cpu16 |
| 66 | linux-amd64-gpu-v100-latest-1 |
| 28 | linux-arm64-cpu16 |
| 23 | windows-amd64-cpu16 |

Two review threads on `libcudacxx/include/cuda/std/__atomic/functions/cuda_ptx_derived.h` (both outdated, resolved)
🟨 CI finished in 3h 50m: Pass: 99%/437 | Total: 2d 22h | Avg: 9m 41s | Max: 1h 20m | Hits: 90%/41584

| +/- | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | pycuda |
| | CUDA C Core Library |

Modifications in project or dependencies?

| +/- | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |
| +/- | CUDA C Core Library |

🏃 Runner counts (total jobs: 437)

| # | Runner |
|---|---|
| 320 | linux-amd64-cpu16 |
| 66 | linux-amd64-gpu-v100-latest-1 |
| 28 | linux-arm64-cpu16 |
| 23 | windows-amd64-cpu16 |


    const uint32_t __attempt = (__old & __windowMask) | __opOffset;

    if (__cuda_atomic_compare_exchange(
          __aligned, __old, __old, __attempt, _Order{}, __atomic_cuda_operand_b32{}, _Sco{}))

This will generate release/acquire/acq_rel/seq_cst fences within the inner loop, but it'd suffice to issue the right fences before/after the loop.
Are you mistaking the `__atomic_cuda_compare_exchange` frontend for this call? This dispatches purely to PTX.
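The pattern the review comment describes (relaxed compare-exchange inside the retry loop, ordering supplied once by fences outside it) can be sketched on the host with C++ `std::atomic`. This is only an illustration of that idea under the assumption that acquire/release ordering is what is needed; `fetch_or_u8_lane` is a hypothetical name, seq_cst would need more care, and the sketch says nothing about what the PTX call in the PR compiles to, which is the point the reply above raises.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical illustration (not libcu++ code): OR `bits` into the 8-bit lane
// `lane` (0 = least-significant byte of the value) of a 32-bit word, with the
// ordering hoisted out of the CAS retry loop.
std::uint8_t fetch_or_u8_lane(std::atomic<std::uint32_t>& word, unsigned lane, std::uint8_t bits)
{
  assert(lane < 4);  // the lane must lie inside the 32-bit word
  const unsigned shift = lane * 8;
  // Zero everywhere except our lane, so OR-ing leaves neighbouring bytes intact.
  const std::uint32_t operand = static_cast<std::uint32_t>(bits) << shift;

  // One release fence before the loop instead of release semantics on every
  // attempt: writes sequenced before it become visible to a thread that reads
  // the value stored by the successful CAS and then performs an acquire.
  std::atomic_thread_fence(std::memory_order_release);

  std::uint32_t old = word.load(std::memory_order_relaxed);
  // Relaxed CAS; on failure `old` is refreshed with the current word and we retry.
  while (!word.compare_exchange_weak(old, old | operand,
                                     std::memory_order_relaxed,
                                     std::memory_order_relaxed))
  {
  }

  // One acquire fence after the loop pairs with the release operation or fence
  // of the thread that wrote the value our successful CAS read.
  std::atomic_thread_fence(std::memory_order_acquire);
  return static_cast<std::uint8_t>((old >> shift) & 0xFFu);
}
```

For example, with `w` holding `0x11223344`, `fetch_or_u8_lane(w, 2, 0x0F)` returns `0x22` and leaves `0x112F3344` in `w`.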
…m in the DAG to fix CI failures

🟩 CI finished in 47m 15s: Pass: 100%/370 | Total: 1d 22h | Avg: 7m 27s | Max: 47m 13s | Hits: 80%/25912

| +/- | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| | pycuda |
| | CUDA C Core Library |

Modifications in project or dependencies?

| +/- | Project |
|---|---|
| +/- | CCCL Infrastructure |
| +/- | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | pycuda |
| +/- | CUDA C Core Library |

🏃 Runner counts (total jobs: 370)

| # | Runner |
|---|---|
| 297 | linux-amd64-cpu16 |
| 30 | linux-amd64-gpu-v100-latest-1 |
| 28 | linux-arm64-cpu16 |
| 15 | windows-amd64-cpu16 |

@gonzalobg can you approve?
Description
This enables the `atomic_ref` APIs to accept 8- and 16-bit types. These types are emulated with CAS loops on the aligned 32-bit word that contains them. Performance is poor, and program correctness depends entirely on whether the surrounding memory (the other bytes of that 32-bit word) is valid and is itself accessed atomically.

Closes #2051
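A minimal host-side sketch of the emulation technique described above, written with C++17 `std::atomic` rather than the CUDA/PTX path the PR actually uses. `subword_fetch_add` and its parameters are hypothetical, not libcu++ APIs; the lane offset is counted from the least-significant byte of the word's value, which sidesteps byte-order questions that a real byte-addressed implementation has to handle.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helper, not the libcu++ implementation: atomically adds `v` to
// an 8- or 16-bit lane of a 32-bit word by running a CAS loop on the whole word.
// `byte_off` counts bytes from the least-significant end of the word's value.
template <class T>
T subword_fetch_add(std::atomic<std::uint32_t>& word, std::size_t byte_off, T v)
{
  static_assert(std::is_unsigned<T>::value && (sizeof(T) == 1 || sizeof(T) == 2),
                "sketch only handles unsigned 8/16-bit lanes");
  assert(byte_off + sizeof(T) <= sizeof(std::uint32_t));  // lane must fit in the word

  const unsigned shift = static_cast<unsigned>(byte_off * 8);
  const std::uint32_t width_mask = (sizeof(T) == 1) ? 0xFFu : 0xFFFFu;
  const std::uint32_t lane_mask = width_mask << shift;

  std::uint32_t old = word.load(std::memory_order_relaxed);
  for (;;)
  {
    const T cur = static_cast<T>((old & lane_mask) >> shift);
    const T upd = static_cast<T>(cur + v);  // wraps modulo 2^(8 * sizeof(T))
    // Keep every other byte of the word as observed; replace only our lane.
    const std::uint32_t desired =
      (old & ~lane_mask) | (static_cast<std::uint32_t>(upd) << shift);
    // On failure, `old` is reloaded with the current word (which may differ in a
    // neighbouring lane), and the loop recomputes `desired` from it.
    if (word.compare_exchange_weak(old, desired, std::memory_order_seq_cst,
                                   std::memory_order_relaxed))
    {
      return cur;
    }
  }
}
```

Starting from `0x11223344`, adding `0x10` to the 8-bit lane at offset 1 returns `0x33` and stores `0x11224344`. Because the CAS reads and writes the whole 32-bit word, every byte of that word has to be valid memory; in C++ terms, a plain (non-atomic) access to a neighbouring byte that races with the loop would be a data race. In this sketch every access goes through `std::atomic<std::uint32_t>`, which is exactly the condition the description calls out.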
Checklist